- By: Ali Salehi
- Copilot: ChatGPT
Todo:
- implement wavelet transform from scratch
- explain filters in wavelet
0. Introduction¶
Overview of Image Compression¶
In today’s digital age, the proliferation of high-resolution images and videos has led to an increasing demand for efficient storage and transmission methods. Image compression plays a crucial role in reducing the file size of images without significantly compromising quality. This enables faster transmission over networks, efficient storage utilization, and improved performance in image processing applications.
Traditional image compression methods like JPEG have been widely used due to their simplicity and effectiveness. However, with advancements in technology and the need for higher compression ratios and better image quality, more sophisticated algorithms have been developed. This evolution led to JPEG 2000, a more advanced image compression standard that overcomes many limitations of its predecessors, though it is itself now more than two decades old.
What is JPEG 2000?¶
JPEG 2000 is an image compression standard and coding system created by the Joint Photographic Experts Group (JPEG) committee in the year 2000. It was designed to provide a flexible and efficient method for compressing images, offering both lossy and lossless compression within a single unified framework.
Key Features and Advantages¶
- Wavelet-Based Compression: Unlike JPEG, which uses the Discrete Cosine Transform (DCT), JPEG 2000 employs the Discrete Wavelet Transform (DWT), leading to better compression efficiency and image quality, especially at high compression ratios.
- Scalability: Supports progressive decoding by quality and by resolution, enabling applications to display lower-quality versions of an image while higher-quality data is still being transmitted.
- Region of Interest (ROI) Coding: Allows certain parts of an image to be coded with higher quality than the rest, which is beneficial in applications where specific areas need to be emphasized.
- Error Resilience: Enhanced robustness to data transmission errors, making it suitable for unreliable network environments.
- Lossless and Lossy Compression: Provides both compression methods in a single algorithm, eliminating the need for separate processes.
Applications and Use Cases¶
- Medical Imaging: High-quality lossless compression is essential for diagnostic purposes.
- Digital Cinema: Efficient compression without compromising image quality is critical for movie distribution.
- Satellite and Remote Sensing: Handles large images with high resolutions effectively.
- Archival Storage: Preservation of digital images with minimal loss in quality.
Objectives of this Notebook¶
What You Will Learn
- Understand the Core Components: Gain a comprehensive understanding of the JPEG 2000 compression pipeline, including pre-processing, transformation, quantization, and encoding.
- Explore the Discrete Wavelet Transform: Learn how DWT differs from other transforms and why it is advantageous for image compression.
- Implement Key Algorithms: Use Python to implement and experiment with the main algorithms used in JPEG 2000.
- Visualize Compression Effects: Observe how different parameters affect image quality and compression efficiency through interactive visualizations.
- Customize for Specific Data: Learn how to adapt and optimize the JPEG 2000 algorithm for different types of images and requirements.
How the Notebook is Structured¶
- Section 1-2: Introduce fundamental concepts and pre-processing steps to set the stage for the core compression techniques.
- Section 3-5: Delve into the core components of JPEG 2000, providing both theoretical background and practical implementations.
- Section 6-7: Discuss the reconstruction process and evaluate the performance of the compression.
- Section 8-9: Explore advanced topics and customization options to extend your understanding and application of JPEG 2000.
- Interactivity and Modularity: Throughout the notebook, interactive elements allow you to experiment with parameters and observe outcomes in real-time, and modular sections enable focused learning on topics of interest.
By the end of this notebook, you will have a solid understanding of how JPEG 2000 works and be equipped with practical skills to implement and customize the algorithm for various applications.
1. Fundamentals of JPEG 2000¶
1.1 The Compression Pipeline¶
JPEG 2000 introduces a sophisticated compression pipeline designed to efficiently encode images while preserving quality. The high-level stages of the pipeline are:
- Pre-processing
- Image Tiling: Splits the image into smaller, manageable tiles to reduce memory usage during processing.
- Color Space Transformation: Converts RGB color space to a luminance-chrominance color space (like YCbCr) for better compression of color images.
- Discrete Wavelet Transform (DWT): Transforms the image data into wavelet coefficients, decomposing it into multiple frequency bands for efficient encoding.
- Quantization: Reduces the precision of the wavelet coefficients to achieve compression by eliminating less significant data.
- Tier-1 Coding (Bit-plane Coding): Encodes the quantized coefficients using Embedded Block Coding with Optimal Truncation (EBCOT), applying context modeling and arithmetic coding.
- Tier-2 Coding (Packetization): Organizes the encoded data into packets, enabling features like scalability and progressive decoding.
- File Formatting: Assembles the codestream with necessary headers and markers to create the final JPEG 2000 file.
Figure 1 illustrates the high-level flow of the JPEG 2000 compression process:
$${\color{red}ToDo: add \space the \space image}$$
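The stage order above can be sketched as a simple composition of stand-in functions. This is an illustrative skeleton only: every function here (`tile`, `dwt_stub`, `quantize_stub`, `entropy_code_stub`) is a placeholder name I am introducing for orientation, not the real algorithm; the actual steps are developed in the sections that follow.

```python
import numpy as np

def tile(image, size=64):
    # Pre-processing: split the image into size x size tiles
    h, w = image.shape[:2]
    return [image[y:y + size, x:x + size]
            for y in range(0, h, size)
            for x in range(0, w, size)]

def dwt_stub(t):
    # Stands in for the multi-level DWT (Section 3)
    return t.astype(np.float32)

def quantize_stub(coeffs, step=10.0):
    # Scalar quantization with rounding (Section 4)
    return np.sign(coeffs) * np.floor(np.abs(coeffs) / step + 0.5)

def entropy_code_stub(q):
    # Stands in for EBCOT tier-1/tier-2 coding
    return q.astype(np.int32).tobytes()

def jpeg2000_encode_sketch(image):
    packets = []
    for t in tile(image):
        coeffs = dwt_stub(t)
        q = quantize_stub(coeffs)
        packets.append(entropy_code_stub(q))
    return packets

img = (np.arange(128 * 128) % 256).reshape(128, 128).astype(np.uint8)
packets = jpeg2000_encode_sketch(img)
print(len(packets))  # 4 packets: one per 64x64 tile
```

The point of the sketch is only the data flow: each tile moves independently through transform, quantization, and coding, which is what makes tiling and progressive packet ordering possible.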
1.2 Key Differences from JPEG¶
While both JPEG and JPEG 2000 are image compression standards, JPEG 2000 introduces significant advancements over the JPEG format.
Technical Advancements
- Wavelet-Based Transformation
- JPEG uses the Discrete Cosine Transform (DCT), which operates on fixed-size blocks.
- JPEG 2000 employs the Discrete Wavelet Transform (DWT), providing better spatial and frequency localization without block artifacts.
- Enhanced Quantization and Coding
- JPEG uses uniform quantization and Huffman coding.
- JPEG 2000 utilizes more advanced quantization techniques and arithmetic coding, improving compression efficiency.
- Scalability and Progressive Transmission
- JPEG supports limited progressive modes.
- JPEG 2000 offers full scalability in resolution and quality, allowing for progressive transmission and display.
- Region of Interest (ROI) Coding
- JPEG does not support ROI coding.
- JPEG 2000 allows specific regions to be encoded with higher quality without significantly increasing file size.
- Lossless Compression Support
- JPEG is primarily lossy.
- JPEG 2000 supports both lossy and true lossless compression within the same framework.
Benefits in Quality and Compression Ratio
- Improved Image Quality at High Compression Ratios: JPEG 2000 delivers better visual quality, especially at lower bitrates, reducing artifacts commonly seen in JPEG images.
- No Blocking Artifacts: Eliminates the blockiness associated with JPEG due to its block-based DCT approach.
- Flexible File Format: The codestream syntax allows for features like random access and easy editing, which are not feasible with JPEG.
Understanding these fundamental differences sets the stage for a deeper exploration of JPEG 2000’s components and how they contribute to its superior performance in image compression.
2. Pre-processing Steps¶
In the JPEG 2000 compression pipeline, pre-processing is essential to prepare the image data for efficient compression. This section covers the key pre-processing steps:
- Image Tiling
- Color Space Transformation
- Sample Data Preparation
2.1 Image Tiling¶
Concept and Purpose of Tiling
Image tiling involves dividing an image into smaller rectangular blocks called tiles. Each tile is processed independently through the compression pipeline. Tiling offers several benefits:
- Memory Efficiency: Processing smaller tiles reduces memory usage, which is crucial for large images.
- Parallel Processing: Tiles can be processed in parallel, leveraging multi-core processors for faster computations.
- Localized Processing: Enhances caching efficiency and allows for localized image processing.
By default, if tiling is not applied, the entire image is treated as a single tile.
Effects on Memory and Processing
- Reduced Memory Footprint: Smaller tiles mean less data in memory at any given time.
- Processing Overhead: Excessive tiling may introduce overhead and reduce compression efficiency due to the loss of correlation between tiles.
- Edge Artifacts: Proper handling is required to prevent artifacts at tile boundaries.
Python Implementation: Let's implement a function to divide an image into tiles.
import numpy as np
def image_tiling(image, tile_size):
    """
    Splits an image into tiles of the specified size.

    Parameters:
    - image: numpy.ndarray
        The input image array (H x W or H x W x C).
    - tile_size: tuple
        A tuple (tile_height, tile_width) specifying the size of each tile.

    Returns:
    - tiles: list of numpy.ndarray
        A list containing the image tiles.
    """
    tiles = []
    img_height, img_width = image.shape[:2]
    tile_height, tile_width = tile_size

    # Calculate the number of tiles needed in each dimension (ceiling division)
    n_tiles_y = (img_height + tile_height - 1) // tile_height
    n_tiles_x = (img_width + tile_width - 1) // tile_width

    for i in range(n_tiles_y):
        for j in range(n_tiles_x):
            # Calculate the start and end indices for the current tile
            y_start = i * tile_height
            y_end = min((i + 1) * tile_height, img_height)
            x_start = j * tile_width
            x_end = min((j + 1) * tile_width, img_width)
            # Extract the tile (works for both grayscale and color images)
            tile = image[y_start:y_end, x_start:x_end]
            tiles.append(tile)
    return tiles
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
# Load an example image
image = data.astronaut() # Returns an RGB image
# Print the size of the original image
print(f"Original Image | Shape: {image.shape}, Range: [{image.min()}, {image.max()}]")
# Define tile size (e.g., 256x256 pixels)
tile_size = (256, 256)
# Apply image tiling
tiles = image_tiling(image, tile_size)
# Calculate the number of tiles in each dimension
img_height, img_width = image.shape[:2]
tile_height, tile_width = tile_size
n_tiles_y = (img_height + tile_height - 1) // tile_height
n_tiles_x = (img_width + tile_width - 1) // tile_width
# Display the tiles in a grid with tile numbers
# (squeeze=False keeps axes 2D even when there is a single row or column)
fig, axes = plt.subplots(n_tiles_y, n_tiles_x, figsize=(10, 10), squeeze=False)
for i in range(n_tiles_y):
    for j in range(n_tiles_x):
        tile_index = i * n_tiles_x + j
        if tile_index < len(tiles):
            axes[i, j].imshow(tiles[tile_index])
            axes[i, j].axis('off')
            # Add the tile number as a subtitle
            axes[i, j].set_title(f'Tile {tile_index + 1}', fontsize=8)
        else:
            axes[i, j].axis('off')  # Hide unused subplots, if any
plt.suptitle('Image Tiles')
plt.show()
Original Image | Shape: (512, 512, 3), Range: [0, 255]
2.2 Color Space Transformation¶
From RGB to YCbCr
Color images are commonly represented in the RGB (Red, Green, Blue) color space. For compression, it’s beneficial to transform the image into a color space that separates luminance from chrominance components, such as YCbCr.
- Y: Luminance (brightness information)
- Cb: Chrominance-blue (color difference between blue and luminance)
- Cr: Chrominance-red (color difference between red and luminance)
This transformation takes advantage of the human visual system’s higher sensitivity to luminance than chrominance, allowing for better compression by allocating more bits to luminance information.
Why Color Transformation is Used?
- Compression Efficiency: Separating luminance and chrominance enables differential compression strategies.
- Perceptual Optimization: Aligns with human vision, reducing noticeable artifacts.
Python Implementation: Here is a Python function to convert between RGB and YCbCr color spaces based on the ITU-R BT.601 standard.
def rgb_to_ycbcr(image):
    """
    Converts an RGB image to the YCbCr color space.

    Parameters:
    - image: numpy.ndarray
        The input RGB image array (H x W x 3), with values in [0, 255].

    Returns:
    - ycbcr_image: numpy.ndarray
        The image in YCbCr color space, with values in [0, 255].
    """
    # Define the transformation matrix (ITU-R BT.601 luma/chroma weights)
    transformation_matrix = np.array([
        [ 0.29900,  0.58700,  0.11400],
        [-0.16874, -0.33126,  0.50000],
        [ 0.50000, -0.41869, -0.08131]
    ])

    # Convert the image to float for the calculation
    image = image.astype(np.float32)

    # Apply the transformation
    ycbcr_image = np.dot(image, transformation_matrix.T)

    # Add offsets (the +16 on Y follows the studio-range convention;
    # full-range JFIF-style YCbCr would use no luma offset)
    ycbcr_image[:, :, 0] += 16.0   # Y channel
    ycbcr_image[:, :, 1] += 128.0  # Cb channel
    ycbcr_image[:, :, 2] += 128.0  # Cr channel

    # Clip values to the valid range and convert back to uint8
    ycbcr_image = np.clip(ycbcr_image, 0, 255).astype(np.uint8)
    return ycbcr_image
def ycbcr_to_rgb(ycbcr_image):
    """
    Converts a YCbCr image back to the RGB color space.

    Parameters:
    - ycbcr_image: numpy.ndarray
        The input YCbCr image array (H x W x 3), with values in standard ranges.

    Returns:
    - rgb_image: numpy.ndarray
        The image in RGB color space, with values in [0, 255].
    """
    # Convert to float for the calculation (astype also copies,
    # so the caller's array is not modified in place)
    ycbcr_image = ycbcr_image.astype(np.float32)

    # Remove the offsets
    ycbcr_image[:, :, 0] -= 16.0   # Y channel
    ycbcr_image[:, :, 1] -= 128.0  # Cb channel
    ycbcr_image[:, :, 2] -= 128.0  # Cr channel

    # Define the inverse transformation matrix
    inverse_matrix = np.array([
        [1.0,  0.0,      1.402  ],
        [1.0, -0.34414, -0.71414],
        [1.0,  1.772,    0.0    ]
    ])

    # Apply the inverse transformation
    rgb_image = np.dot(ycbcr_image, inverse_matrix.T)

    # Clip values to the valid range and convert back to uint8
    rgb_image = np.clip(rgb_image, 0, 255).astype(np.uint8)
    return rgb_image
image = tiles[0]
# Convert the image to YCbCr color space
ycbcr_image = rgb_to_ycbcr(image)
print(f"YCbCr Image | Shape: {ycbcr_image.shape}, Range: [{ycbcr_image.min()}, {ycbcr_image.max()}]")
# Display the Y, Cb, and Cr components separately
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
axes[0].imshow(ycbcr_image[:, :, 0], cmap='gray')
axes[0].axis('off')
axes[0].set_title('Y Component (Luminance)')
axes[1].imshow(ycbcr_image[:, :, 1], cmap='gray')
axes[1].axis('off')
axes[1].set_title('Cb Component (Blue Chrominance)')
axes[2].imshow(ycbcr_image[:, :, 2], cmap='gray')
axes[2].axis('off')
axes[2].set_title('Cr Component (Red Chrominance)')
plt.suptitle('YCbCr Components')
plt.show()
# Convert the YCbCr image back to RGB color space
reconstructed_rgb_image = ycbcr_to_rgb(ycbcr_image)
# Display the original and reconstructed images side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(image)
axes[0].axis('off')
axes[0].set_title('Original RGB Image')
axes[1].imshow(reconstructed_rgb_image)
axes[1].axis('off')
axes[1].set_title('Reconstructed RGB Image from YCbCr')
plt.suptitle('RGB to YCbCr and Back Conversion')
plt.show()
YCbCr Image | Shape: (256, 256, 3), Range: [16, 255]
2.3 Sample Data Preparation¶
Normalization and Data Types
Before applying the wavelet transform and subsequent compression steps, the image data must be prepared:
- Normalization (optional): Scaling pixel values to a standard range (e.g., [-0.5, 0.5]) ensures numerical stability and consistency.
- Data Types: Converting data to float32 for precise computations.
Padding and Edge Handling
Processing techniques like wavelet transforms may require the image dimensions to be multiples of certain values.
- Padding: Extends the image dimensions to the nearest multiple required by the processing algorithm.
- Edge Handling: Methods include zero-padding, symmetric padding, or constant values to handle image borders.
Python Implementation: Let's implement simple versions of these preprocessing steps.
def sample_data_preparation(image, normalize_range=(-0.5, 0.5)):
    """
    Prepares image data by normalizing and converting data types.

    Parameters:
    - image: numpy.ndarray
        The input image array (H x W).
    - normalize_range: tuple
        The (min, max) range to scale the pixel values into.

    Returns:
    - prepared_image: numpy.ndarray
        The processed image array, as float32.
    """
    # Convert to float32
    image = image.astype(np.float32)

    # Scale pixel values to [0, 1] if they are in [0, 255]
    if image.max() > 1.0:
        image /= 255.0

    # Map [0, 1] to the desired range
    min_val, max_val = normalize_range
    prepared_image = image * (max_val - min_val) + min_val
    return prepared_image
def pad_image(image, block_size):
    """
    Pads the image so its dimensions are multiples of block_size.

    Parameters:
    - image: numpy.ndarray
        The input image array (H x W).
    - block_size: int
        The block size to pad to.

    Returns:
    - padded_image: numpy.ndarray
        The padded image array.
    """
    height, width = image.shape[:2]
    pad_height = (block_size - (height % block_size)) % block_size
    pad_width = (block_size - (width % block_size)) % block_size
    padded_image = np.pad(
        image,
        ((0, pad_height), (0, pad_width)),
        mode='reflect'  # Mirror extension, akin to the symmetric extension used in JPEG 2000
    )
    return padded_image
# Extract the Y (luminance) component to show how the preparation steps work
Y_component = ycbcr_image[:, :, 0]
# Step 1: Display the original Y component with its range and size
print(f"Original Y Component | Shape: {Y_component.shape}, Range: [{Y_component.min():.2f}, {Y_component.max():.2f}]")
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.imshow(Y_component, cmap='gray')
plt.axis('off')
plt.title('Original Y Component')
# Step 2: Prepare the Y component (normalize to range [-0.5, 0.5])
#normalize_range = (-0.5, 0.5)
#Y_component = sample_data_preparation(Y_component, normalize_range=normalize_range)
# # Print details after normalization
# print(f"Normalized Y Component | Shape: {Y_component.shape}, Range: [{Y_component.min():.2f}, {Y_component.max():.2f}]")
# plt.subplot(1, 3, 2)
# plt.imshow(Y_component, cmap='gray')
# plt.axis('off')
# plt.title(f'Normalized Y Component\nRange {normalize_range}')
# Step 3: Pad the image to be a multiple of the block size
block_size = 20  # Note: 20 is only for demonstration; powers of two suit the DWT better
Y_padded = pad_image(Y_component, block_size)
# Print details after padding
print(f"Padded Y Component | Shape: {Y_padded.shape}, Range: [{Y_padded.min():.2f}, {Y_padded.max():.2f}]")
plt.subplot(1, 2, 2)
plt.imshow(Y_padded, cmap='gray')
plt.axis('off')
plt.title(f'Padded Y Component\n(Block Size {block_size}x{block_size})')
plt.suptitle('Data Preparation and Padding Steps')
plt.show()
Original Y Component | Shape: (256, 256), Range: [16.00, 255.00]
Padded Y Component | Shape: (260, 260), Range: [16.00, 255.00]
3. Discrete Wavelet Transform (DWT)¶
The Discrete Wavelet Transform (DWT) is a core component of the JPEG 2000 compression algorithm. It transforms the spatial domain image data into a frequency domain representation, enabling efficient compression by concentrating energy into a few significant coefficients.
3.1 Introduction to Wavelets¶
What are Wavelets?
Wavelets are mathematical functions that divide data into different frequency components and analyze each component with a resolution matched to its scale. Unlike traditional Fourier transforms, which use sinusoids extending over infinite time, wavelets are localized in both time (or space) and frequency.
Key Characteristics:
- Localization: Wavelets are localized in space and frequency, making them suitable for analyzing transient or non-stationary signals.
- Multi-Resolution Analysis: Wavelets provide a hierarchical framework, allowing analysis at various levels of detail.
Why Use Wavelets in Image Compression?
- Efficient Energy Compaction: Wavelets concentrate most of the signal’s energy into a few large coefficients, making it easier to compress data effectively.
- No Blocking Artifacts: Unlike block-based transforms (e.g., DCT in JPEG), wavelets do not introduce blocking artifacts.
- Scalability: Wavelet transforms enable progressive transmission and decoding, supporting features like resolution scalability.
Comparison with Fourier Transform and DCT
| Feature | Fourier Transform | Discrete Cosine Transform (DCT) | Discrete Wavelet Transform (DWT) |
|---|---|---|---|
| Localization | Frequency only | Frequency only (block-based) | Both space and frequency |
| Basis Functions | Sinusoids | Cosine functions | Scaled and shifted wavelets |
| Block Artifacts | N/A | May occur due to block processing | None |
| Multi-Resolution Analysis | No | Limited (via block sizes) | Yes |
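To make localization and energy compaction concrete, here is a minimal 1D wavelet transform from scratch using the Haar wavelet, the simplest possible filter pair (JPEG 2000 uses the longer 9/7 and 5/3 filters, but the averaging/differencing structure is the same idea):

```python
import numpy as np

def haar_dwt_1d(signal):
    """One level of the orthonormal 1D Haar DWT (even-length input)."""
    s = np.asarray(signal, dtype=np.float64)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)  # low-pass filter + downsample
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)  # high-pass filter + downsample
    return approx, detail

def haar_idwt_1d(approx, detail):
    """Inverse of haar_dwt_1d: perfect reconstruction."""
    s = np.empty(approx.size * 2)
    s[0::2] = (approx + detail) / np.sqrt(2)
    s[1::2] = (approx - detail) / np.sqrt(2)
    return s

# A locally smooth signal: energy concentrates in the approximation
x = np.array([10.0, 10.0, 12.0, 12.0, 50.0, 50.0, 48.0, 48.0])
a, d = haar_dwt_1d(x)
print(d)                                    # all zeros: neighbors are equal
print(np.allclose(haar_idwt_1d(a, d), x))  # True: perfect reconstruction
```

Because the detail coefficients of smooth regions are (near) zero, they quantize to zero almost for free, which is exactly the energy-compaction property the table above credits to the DWT.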
3.2 DWT in JPEG 2000¶
Types of Wavelets Used: JPEG 2000 employs two main types of wavelet filters:
- Daubechies 9/7 Tap Filters (Biorthogonal 9/7): Used for lossy compression due to their excellent energy compaction properties.
- Le Gall 5/3 Tap Filters (Biorthogonal 5/3): Used for lossless compression since they can be implemented with integer operations.
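The 5/3 filter's reversibility comes from its lifting structure: a "predict" step on the odd samples followed by an "update" step on the even samples, both in integer arithmetic, so the inverse can undo each step exactly. A minimal 1D sketch of this idea (for brevity it uses periodic extension via `np.roll` and assumes an even-length signal; the standard specifies symmetric extension at the borders):

```python
import numpy as np

def legall53_forward(x):
    """One level of an integer 5/3 lifting transform (periodic borders)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict: subtract from each odd sample the average of its even neighbors
    odd -= (even + np.roll(even, -1)) // 2
    # Update: add to each even sample a rounded quarter of neighboring details
    even += (np.roll(odd, 1) + odd + 2) // 4
    return even, odd  # low-pass (approximation), high-pass (detail)

def legall53_inverse(low, high):
    """Exactly undo the lifting steps in reverse order."""
    even = low - (np.roll(high, 1) + high + 2) // 4
    odd = high + (even + np.roll(even, -1)) // 2
    x = np.empty(even.size * 2, dtype=np.int64)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([10, 12, 14, 13, 11, 9, 8, 8])
low, high = legall53_forward(x)
print(np.array_equal(legall53_inverse(low, high), x))  # True: fully reversible
```

Because every operation is an integer add/subtract of a deterministically rounded term, no information is lost, which is why this filter supports true lossless compression.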
Multi-Level Decomposition and Sub-Bands: The DWT decomposes the image into multiple levels, each consisting of four sub-bands:
- LL (Low-Low): Approximation coefficients (low frequency in both dimensions).
- LH (Low-High): Horizontal details (low frequency vertically, high frequency horizontally).
- HL (High-Low): Vertical details (high frequency vertically, low frequency horizontally).
- HH (High-High): Diagonal details (high frequency in both dimensions).
Each subsequent level operates on the LL sub-band from the previous level, allowing for a hierarchical representation.
Figure 2: Multi-Level DWT Decomposition
$${\color{red} ToDo }$$ Insert a diagram showing multi-level DWT decomposition.
Applying DWT in JPEG 2000
- Forward Transform: The image data is passed through a series of low-pass and high-pass filters, followed by downsampling, to produce the wavelet coefficients.
- Number of Levels: The number of decomposition levels depends on the image size and desired compression ratio.
- Edge Handling: Symmetric extension is typically used to handle image borders, avoiding artifacts.
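The row/column filtering-plus-downsampling described above can be written from scratch for the simplest (Haar) filter pair; the 9/7 and 5/3 filters follow the same separable pattern, just with longer kernels and symmetric border extension. This sketch assumes even image dimensions:

```python
import numpy as np

def haar_dwt_2d(image):
    """One level of a separable 2D Haar DWT, returning the four sub-bands."""
    img = np.asarray(image, dtype=np.float64)
    # Filter the rows: low-pass and high-pass with downsampling
    lo = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2)
    hi = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2)
    # Filter the columns of each row output with the same filters
    LL = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2)  # approximation
    LH = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2)  # one detail orientation
    HL = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2)  # the other orientation
    HH = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2)  # diagonal detail
    return LL, LH, HL, HH

img = np.arange(64, dtype=np.float64).reshape(8, 8)
LL, LH, HL, HH = haar_dwt_2d(img)
print(LL.shape)  # (4, 4): each sub-band is a quarter of the image
```

A further decomposition level would apply `haar_dwt_2d` to `LL` again, giving the hierarchical representation described above. Since the Haar filters here are orthonormal, the total energy of the four sub-bands equals that of the input, so nothing is lost by the transform itself.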
3.3 Python Implementation¶
We’ll run the DWT using both the PyWavelets library and a custom implementation for educational purposes. First, let's use the PyWavelets library. In this library:
- For Lossy Compression: Use 'bior4.4' (approximates the Daubechies 9/7 wavelet).
- For Lossless Compression: Use 'bior2.2' (approximates the Le Gall 5/3 wavelet).
Example Usage
You can use the pre-processed image tile Y_padded from the previous section (as done below), or uncomment one of the alternative images and apply the wavelet to that instead:
import pywt
import matplotlib.pyplot as plt
from utils import visualize_coeffs, visualize_coeffs_2D, create_pattern_image
#image = pywt.data.camera()
#image = create_pattern_image(save_dir='.', image_width=512, image_height=800, grayscale=True)
image = Y_padded
base_figsize = 4
figsize = (base_figsize, base_figsize)
plt.figure(figsize=figsize)
plt.imshow(image, cmap='gray')
plt.title(f'Input image')
plt.axis('off')
plt.show()
# Perform multi-level 2D Discrete Wavelet Transform
# Set number of levels for decomposition
n_levels = 1
#The type of wavelet to use (e.g., 'bior4.4' for 9/7, 'bior2.2' for 5/3).
wavelet = 'bior4.4'
coefficients = pywt.wavedec2(image, wavelet, level=n_levels)
visualize_coeffs(coefficients, n_levels, image.shape, base_figsize)
When visualizing these coefficients, we usually arrange the subbands in a 2D array the same size as the input image in which:
- LL (Low-Low) is positioned in the top left corner. This sub-band contains the low-frequency (coarse) components of the image, representing the overall average information—a smaller, smoothed version of the original image.
- HH (High-High) occupies the bottom right corner. This sub-band, known as the Diagonal Detail Coefficients, captures diagonal edges and details, focusing on variations along both rows and columns.
- HL (High-Low) is placed in the top right corner. The Horizontal Detail Coefficients capture high-frequency components in the horizontal direction, highlighting vertical edges and details.
- LH (Low-High) appears in the bottom left corner. These Vertical Detail Coefficients capture high-frequency components in the vertical direction, emphasizing horizontal edges and details.
By visualizing the coefficients in this layout, we gain insight into how different levels of detail and directional features are preserved and separated by the wavelet transform, enabling targeted compression of image components based on their perceptual significance. Let's look at them using a helper function:
visualize_coeffs_2D(coefficients, n_levels)
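Under the hood, a layout helper like this only has to tile the four equally sized sub-band arrays into one image-sized grid. A minimal stand-alone version of that step (with random arrays standing in for real coefficients):

```python
import numpy as np

# Four stand-in sub-band arrays of equal shape (real ones would come
# from a one-level DWT of an 8x8 image)
rng = np.random.default_rng(0)
LL, HL, LH, HH = (rng.normal(size=(4, 4)) for _ in range(4))

# LL top-left, HL top-right, LH bottom-left, HH bottom-right,
# matching the layout described in the text above
layout = np.block([[LL, HL],
                   [LH, HH]])
print(layout.shape)  # (8, 8): same footprint as the original image
```

For multiple decomposition levels, the same tiling is applied recursively inside the top-left quadrant.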
# repeat with more levels in DWT
base_figsize = 4
figsize = (base_figsize, base_figsize)
plt.figure(figsize=figsize)
plt.imshow(image, cmap='gray')
plt.title(f'Input image')
plt.axis('off')
plt.show()
# Perform multi-level 2D Discrete Wavelet Transform
# Set number of levels for decomposition
n_levels = 2
#The type of wavelet to use (e.g., 'bior4.4' for 9/7, 'bior2.2' for 5/3).
wavelet = 'bior4.4'
coefficients = pywt.wavedec2(image, wavelet, level=n_levels)
# visualize_coeffs(coefficients, n_levels, image.shape, base_figsize)
visualize_coeffs_2D(coefficients, n_levels)
# repeat with a different image
image = create_pattern_image(save_dir='.', image_width=1024, image_height=1024, grayscale=True)
base_figsize = 8
figsize = (base_figsize, base_figsize)
plt.figure(figsize=figsize)
plt.imshow(image, cmap='gray')
plt.title(f'Input image')
plt.axis('off')
plt.show()
# Perform multi-level 2D Discrete Wavelet Transform
# Set number of levels for decomposition
n_levels = 2
#The type of wavelet to use (e.g., 'bior4.4' for 9/7, 'bior2.2' for 5/3).
wavelet = 'bior4.4'
coefficients = pywt.wavedec2(image, wavelet, level=n_levels)
# visualize_coeffs(coefficients, n_levels, image.shape, base_figsize)
visualize_coeffs_2D(coefficients, n_levels)
4. Quantization¶
After obtaining the wavelet coefficients from the DWT, the next step in the JPEG 2000 compression pipeline is quantization. Quantization reduces the precision of the coefficients to achieve compression by discarding less significant information.
4.1 Purpose of Quantization in Compression¶
Quantization is crucial for reducing the amount of data required to represent the image:
- Compression Efficiency: By reducing the number of bits needed to represent the coefficients, quantization contributes significantly to the overall compression ratio.
- Perceptual Optimization: Less critical visual information can be quantized more aggressively without noticeably affecting image quality.
- Trade-off Control: Adjusting quantization levels allows control over the balance between compression ratio and image quality.
4.2 Quantization Process in JPEG 2000¶
JPEG 2000 uses scalar quantization with a dead zone around zero:
- Scalar Quantization: Each wavelet coefficient is divided by a quantization step size and then rounded to the nearest integer.
- Dead-Zone Quantization: Coefficients near zero are more aggressively quantized, effectively setting small coefficients to zero, which enhances compression.
The quantization process for a coefficient c is:
$$ q = \text{sign}(c) \times \left\lfloor \frac{|c|}{\Delta} + 0.5 \right\rfloor $$
$ q $ : Quantized coefficient
$ \Delta $ : Quantization step size.
- Higher Quantization Step Size: Leads to more coefficients being zeroed, resulting in higher compression but potential loss of image quality.
- Lower Quantization Step Size: Preserves more details, leading to better image quality but lower compression.
The $+0.5$ term makes the floor operation round to the nearest integer instead of always rounding down. (The JPEG 2000 standard's dead-zone quantizer omits this term, which widens the interval of coefficients that quantize to zero.)
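The difference between the rounding rule used in this notebook and the standard's dead-zone rule is easy to see numerically (a small illustration with arbitrarily chosen values):

```python
import numpy as np

c = np.array([-25.0, -7.0, -3.0, 0.0, 3.0, 7.0, 25.0])
step = 10.0

# Rounding quantizer, as implemented in this notebook
q_round = np.sign(c) * np.floor(np.abs(c) / step + 0.5)
# Dead-zone quantizer from the JPEG 2000 standard: no +0.5 term,
# so the interval that maps to zero is twice as wide
q_dead = np.sign(c) * np.floor(np.abs(c) / step)

print(q_round)  # [-3. -1.  0.  0.  0.  1.  3.]
print(q_dead)   # [-2. -0.  0.  0.  0.  0.  2.]
```

Note how ±7 survives the rounding quantizer but falls into the dead zone, increasing the number of zero coefficients and hence the compression.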
4.3 Python Implementation¶
Function: Quantization of Wavelet Coefficients
def quantize(coefficient, step_size):
    """
    Performs scalar quantization on a coefficient matrix.

    Parameters:
    - coefficient: numpy.ndarray
        The input matrix of coefficients to be quantized.
    - step_size: float
        The quantization step size.

    Returns:
    - quantized: numpy.ndarray
        The quantized coefficient matrix.
    """
    # The +0.5 makes the floor round to the nearest integer
    # (instead of always rounding down)
    quantized = np.sign(coefficient) * np.floor(np.abs(coefficient) / step_size + 0.5)
    return quantized.astype(np.int64)
def quantize_coeffs(coeffs, step_sizes, quantize_LL=True):
    """
    Quantizes the wavelet coefficients using scalar quantization.

    Parameters:
    - coeffs: list
        Wavelet coefficients as returned by pywt.wavedec2.
    - step_sizes: list or float
        Quantization step sizes for each sub-band, or a single value for all.
    - quantize_LL: bool
        If True, quantize the approximation coefficients (lossy compression).

    Returns:
    - quantized_coeffs: list
        Quantized wavelet coefficients.
    """
    quantized_coeffs = []

    # Quantize the approximation coefficients if quantize_LL is True
    if quantize_LL:
        # Use the first step size for LL if a list is provided,
        # otherwise use the single step size
        step_size_LL = step_sizes[0] if isinstance(step_sizes, list) else step_sizes
        qA = quantize(coeffs[0], step_size_LL)
        quantized_coeffs.append(qA)
    else:
        # Keep the LL coefficients unquantized for lossless compression
        quantized_coeffs.append(coeffs[0])

    # If step_sizes is a single value, replicate it for all detail levels
    if isinstance(step_sizes, (float, int)):
        step_sizes = [step_sizes] * len(coeffs[1:])

    # Quantize the detail coefficients at each level
    for idx, detail_coeffs in enumerate(coeffs[1:]):
        step_size = step_sizes[idx] if idx < len(step_sizes) else step_sizes[-1]
        qH = quantize(detail_coeffs[0], step_size)
        qV = quantize(detail_coeffs[1], step_size)
        qD = quantize(detail_coeffs[2], step_size)
        quantized_coeffs.append((qH, qV, qD))
    return quantized_coeffs
from utils import count_zeros_in_coeffs
# image = pywt.data.camera()
image = Y_padded
#image = sample_data_preparation(image)
print(f"{image.min() = }, {image.max()= }")
n_levels = 1
wavelet = 'bior4.4'
coefficients = pywt.wavedec2(image, wavelet, level=n_levels)
step_size = 20
coefficients_quantized = quantize_coeffs(coefficients, step_sizes=step_size)
visualize_coeffs_2D(coefficients, n_levels, title_suffix="")
visualize_coeffs_2D(coefficients_quantized, n_levels, title_suffix="- after quantization")
zero_counts_before = count_zeros_in_coeffs(coefficients)
zero_counts_after = count_zeros_in_coeffs(coefficients_quantized)
print(f"{coefficients[0].min() = } {coefficients[0].max() = }")
print(f"{coefficients_quantized[0].min() = } {coefficients_quantized[0].max() = }")
print(f"{zero_counts_before = }")
print(f"{zero_counts_after = }")
image.min() = np.uint8(16), image.max()= np.uint8(255)
coefficients[0].min() = np.float64(17.10717075427084) coefficients[0].max() = np.float64(514.9584387485997)
coefficients_quantized[0].min() = np.int64(1) coefficients_quantized[0].max() = np.int64(26)
zero_counts_before = {'LL': {'count': 0, 'percentage': 0.0}, 'LH': [{'count': 0, 'percentage': 0.0}], 'HL': [{'count': 0, 'percentage': 0.0}], 'HH': [{'count': 0, 'percentage': 0.0}], 'total': {'count': 0, 'percentage': 0.0}}
zero_counts_after = {'LL': {'count': 0, 'percentage': 0.0}, 'LH': [{'count': 16504, 'percentage': 91.91356649587881}], 'HL': [{'count': 16749, 'percentage': 93.27801292047226}], 'HH': [{'count': 17420, 'percentage': 97.01492537313433}], 'total': {'count': 50673, 'percentage': 70.55162619737135}}
coefficients_quantized
[array([[21, 22, 23, ..., 22, 22, 22],
[13, 16, 19, ..., 22, 22, 22],
[ 8, 13, 17, ..., 22, 22, 22],
...,
[ 6, 6, 6, ..., 4, 6, 2],
[ 6, 6, 6, ..., 4, 6, 2],
[ 6, 6, 6, ..., 3, 5, 2]]),
(array([[ 0, 0, 0, ..., 0, 0, 0],
[ 1, 1, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., -1, -1, 0],
[ 0, 0, 0, ..., 0, 0, 0]]),
array([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, -1, 0, ..., 0, 0, 0],
[ 1, -1, 0, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., -1, 0, 0],
[ 0, 0, 0, ..., -1, 0, 0],
[ 0, 0, 0, ..., -1, 0, 0]]),
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]))]
Let's try to reconstruct the images after the quantization and visualize the differences.
import numpy as np
import matplotlib.pyplot as plt
import pywt
def dequantize(quantized, step_size):
    """
    Dequantizes a quantized coefficient matrix.

    Parameters:
    - quantized: numpy.ndarray
        The quantized coefficient matrix.
    - step_size: float
        The quantization step size.

    Returns:
    - dequantized: numpy.ndarray
        The dequantized coefficient matrix.
    """
    dequantized = quantized * step_size
    return dequantized
def dequantize_coeffs(quantized_coeffs, step_sizes, quantize_LL=True):
"""
Dequantizes the quantized wavelet coefficients.
Parameters:
- quantized_coeffs: list
Quantized wavelet coefficients.
- step_sizes: list or float
Quantization step sizes used during quantization.
- quantize_LL: bool
If True, dequantize approximation coefficients.
Returns:
- dequantized_coeffs: list
Dequantized wavelet coefficients.
"""
dequantized_coeffs = []
# Dequantize the approximation coefficients if quantize_LL is True
if quantize_LL:
step_size_LL = step_sizes[0] if isinstance(step_sizes, list) else step_sizes
dA = dequantize(quantized_coeffs[0], step_size_LL)
dequantized_coeffs.append(dA)
else:
dequantized_coeffs.append(quantized_coeffs[0])
# If step_sizes is a single value, replicate it for all detail coefficients
if isinstance(step_sizes, (float, int)):
step_sizes = [step_sizes] * (len(quantized_coeffs) - 1)
# Dequantize each detail coefficient at each level
for idx, quantized_detail_coeffs in enumerate(quantized_coeffs[1:]):
step_size = step_sizes[idx] if idx < len(step_sizes) else step_sizes[-1]
qH, qV, qD = quantized_detail_coeffs
dH = dequantize(qH, step_size)
dV = dequantize(qV, step_size)
dD = dequantize(qD, step_size)
dequantized_coeffs.append((dH, dV, dD))
return dequantized_coeffs
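As a quick sanity check, the quantize/dequantize round trip can lose at most half a step per coefficient. The sketch below mirrors the earlier quantizer (assumed here to round to the nearest multiple of the step, which is consistent with the coefficient ranges printed above) followed by the dequantization just defined:

```python
import numpy as np

step = 20.0
coeffs = np.array([17.1, 514.9, -42.7, 0.3])
# Quantize (assumed round-to-nearest, matching the earlier cells)...
quantized = np.round(coeffs / step).astype(np.int64)
# ...and dequantize by scaling back up
restored = quantized * step
# The reconstruction error is bounded by half a quantization step
assert np.all(np.abs(coeffs - restored) <= step / 2)
```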
from utils import visualize_reconstruction
def reconstruct_component(component_coeffs_quantized, step_sizes, quantize_LL, wavelet='bior4.4'):
    """
    Dequantizes a single component's coefficients and applies the inverse DWT.
    The default wavelet matches the one used during decomposition.
    """
    dequantized_coeffs = dequantize_coeffs(component_coeffs_quantized, step_sizes, quantize_LL=quantize_LL)
    # Ensure coefficients are in float format for inverse DWT
    dequantized_coeffs_float = []
    for coeff_level in dequantized_coeffs:
        if isinstance(coeff_level, tuple):
            # Detail coefficients are tuples
            coeffs = tuple(coeff.astype(np.float32) for coeff in coeff_level)
        else:
            # Approximation coefficients
            coeffs = coeff_level.astype(np.float32)
        dequantized_coeffs_float.append(coeffs)
    # Apply inverse DWT
    component_reconstructed = pywt.waverec2(dequantized_coeffs_float, wavelet)
    return component_reconstructed
def reconstruct_image(coefficients_quantized, wavelet, step_sizes, quantize_LL=True):
    """
    Reconstructs the image from quantized coefficients: dequantization, inverse DWT, and color transform.
    Parameters:
    - coefficients_quantized: list of lists
        A list containing the quantized coefficients for each color component (Y, Cb, Cr).
    - wavelet: str
        The wavelet used for the inverse DWT.
    - step_sizes: list or float
        Quantization step sizes used during quantization.
    - quantize_LL: bool
        Whether the LL sub-band was quantized.
    Returns:
    - reconstructed_image: numpy.ndarray
        The reconstructed RGB image.
    """
    # List to hold reconstructed components
    reconstructed_components = []
    # Process each color component separately
    for component_coeffs_quantized in coefficients_quantized:
        # Dequantize and apply inverse DWT
        component_reconstructed = reconstruct_component(component_coeffs_quantized, step_sizes, quantize_LL, wavelet=wavelet)
        reconstructed_components.append(component_reconstructed)
    # Stack the components to form a YCbCr image
    reconstructed_ycbcr = np.stack(reconstructed_components, axis=2)
    # Convert YCbCr to RGB
    reconstructed_image = ycbcr_to_rgb(reconstructed_ycbcr)
    return reconstructed_image
# Reconstruct the single-component image and visualize it against the original
reconstructed_image = reconstruct_component(coefficients_quantized, step_sizes=step_size, quantize_LL=True)
visualize_reconstruction(image, reconstructed_image)
diff = image - reconstructed_image
print(diff.max())
17.510971
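The maximum pixel difference alone is a crude measure; PSNR is the usual single-number summary of reconstruction quality. A small helper (hypothetical, not part of the notebook) could be applied to the image pairs above:

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    mse = np.mean((np.asarray(original, dtype=np.float64)
                   - np.asarray(reconstructed, dtype=np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10 * np.log10(max_value ** 2 / mse)

# Example: an image that is uniformly off by one gray level
a = np.full((8, 8), 100, dtype=np.uint8)
b = a + 1
print(f"{psnr(a, b):.2f} dB")  # 48.13 dB
```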
# test the entire reconstruction code
# Load the original image (RGB)
from skimage import data
original_image = data.astronaut()
# Convert the image to YCbCr color space
ycbcr_image = rgb_to_ycbcr(original_image)
# Separate the components
Y_component = ycbcr_image[:, :, 0]
Cb_component = ycbcr_image[:, :, 1]
Cr_component = ycbcr_image[:, :, 2]
# Perform DWT on each component
wavelet = 'bior4.4' # Should match the wavelet used during quantization
n_levels = 3
coefficients_Y = pywt.wavedec2(Y_component, wavelet=wavelet, level=n_levels)
coefficients_Cb = pywt.wavedec2(Cb_component, wavelet=wavelet, level=n_levels)
coefficients_Cr = pywt.wavedec2(Cr_component, wavelet=wavelet, level=n_levels)
# Quantize the coefficients (you may have already done this)
step_size = 20 # Adjust the step size as needed
coefficients_quantized_Y = quantize_coeffs(coefficients_Y, step_sizes=step_size)
coefficients_quantized_Cb = quantize_coeffs(coefficients_Cb, step_sizes=step_size)
coefficients_quantized_Cr = quantize_coeffs(coefficients_Cr, step_sizes=step_size)
# Collect quantized coefficients into a list
coefficients_quantized = [coefficients_quantized_Y, coefficients_quantized_Cb, coefficients_quantized_Cr]
# Reconstruct the image with dequantization
reconstructed_image = reconstruct_image(coefficients_quantized, wavelet, step_size, quantize_LL=True)
# Visualize the original and reconstructed images
visualize_reconstruction(original_image, reconstructed_image)
print(f"{original_image.min() = } {original_image.max() = }")
print(f"{reconstructed_image.min() = } {reconstructed_image.max() = }")
original_image.min() = np.uint8(0) original_image.max() = np.uint8(255)
reconstructed_image.min() = np.uint8(0) reconstructed_image.max() = np.uint8(255)
5. Tier-1 Coding: Bit-plane Coding¶
After quantization, the coefficients are encoded in the Tier-1 coding stage using Embedded Block Coding with Optimal Truncation (EBCOT). This process involves several intricate steps designed to maximize compression while allowing for progressive transmission and precise bitrate control. EBCOT encodes quantized wavelet coefficients in a way that allows for:
- Progressive Decoding: The ability to reconstruct the image at various quality levels.
- Optimized Compression: Achieving high compression ratios without significant loss of image quality.
- Bitrate Control: Precisely controlling the output bitrate or file size.
5.1 Main Components of EBCOT¶
- Codeblock Formation: Dividing the sub-bands into smaller blocks for independent processing.
- Bit-plane Coding: Encoding the coefficients bit-plane by bit-plane using three passes.
- Context Modeling: Utilizing the significance of neighboring coefficients to model probabilities.
- Arithmetic Coding: Compressing the bitstream using an arithmetic encoder.
5.2 Codeblock Formation: Dividing Sub-bands into Codeblocks¶
Each sub-band from the quantized coefficients is divided into smaller rectangular regions called codeblocks. Typical codeblock sizes are 64x64 or 32x32 coefficients.
Benefits of Codeblocks
- Error Resilience: Errors are localized to individual codeblocks.
- Parallel Processing: Codeblocks can be processed independently, allowing for parallel computation.
- Selective Refinement: Enables selective enhancement of regions during progressive decoding.
Here is a simple implementation:
def divide_into_codeblocks(coeffs_band, block_size=(64, 64)):
"""
Divides a coefficient band into codeblocks.
Parameters:
- coeffs_band: numpy.ndarray
A 2D array of quantized coefficients for a band.
- block_size: tuple
The size of each codeblock (height, width).
Returns:
- codeblocks: list of numpy.ndarray
A list containing the codeblocks.
"""
blocks = []
height, width = coeffs_band.shape
block_height, block_width = block_size
for i in range(0, height, block_height):
for j in range(0, width, block_width):
block = coeffs_band[i:i+block_height, j:j+block_width]
blocks.append(block)
return blocks
# Assuming 'coefficients_quantized' (single-component) is from the previous steps
band = coefficients_quantized[-1][0] # Horizontal detail coefficients at the finest level
codeblocks = divide_into_codeblocks(band)
print(codeblocks[-2])
[[ 0 0 0 1 0 0 -2 5 -8 3 -3 9 -4 -8 -8 5 0 1
-1 -1 0 4 7 -23 -22 15 5 -30 12 6 -3 2 12 -17 -9 4
3 3 -6 -3 2 4 5 2 -26 -15 -10 4 2 -14 -12 4 31 17
2 6 3 6 5 9 5 2 -1 1]
[ 1 -7 2 -2 -1 -2 2 -8 9 2 -8 -18 12 8 17 -8 0 -1
-1 1 -10 -16 -4 8 26 -10 -4 34 10 -26 -7 1 -6 4 55 22
4 -1 7 11 3 -5 1 7 13 14 4 1 1 -1 8 3 -12 -58
-51 -15 -13 -6 -1 -2 -5 -8 -14 3]
[ 0 0 0 1 1 0 -2 5 -6 3 -4 11 -4 -10 -9 4 0 1
-1 -1 1 3 6 -23 -16 11 4 -30 12 6 0 3 12 -16 -12 -2
3 3 -6 -3 2 3 0 0 -26 -15 -11 1 1 -14 -12 4 31 17
2 6 4 5 5 9 5 1 -1 0]
[ 2 -1 -1 -1 -3 1 2 -4 16 -1 9 -9 1 17 -2 4 -1 -1
0 -1 1 -10 12 2 -6 -21 24 5 -3 -1 -2 -17 0 5 1 1
-3 2 2 -1 -2 0 0 29 14 -2 5 16 22 10 0 -5 -4 0
-1 -2 0 -2 -3 -2 -1 0 1 2]
[ -3 1 0 -1 3 0 0 2 -17 2 -7 4 -2 -19 10 -8 2 0
0 2 -2 9 -22 8 18 15 -38 12 4 -2 1 19 -11 -4 6 1
2 -5 0 3 2 -1 1 -35 -1 17 4 -20 -27 -1 15 14 -2 -8
-1 -1 -3 -1 1 -3 -3 -3 -4 -3]
[ 0 3 -1 2 -1 1 -1 1 3 -5 6 3 -1 7 -8 3 -1 0
1 0 5 5 6 9 -9 0 12 -4 -17 11 4 -5 2 14 -23 -12
-4 1 1 -5 -3 1 -1 2 6 -7 0 3 4 6 -6 -14 -18 21
26 4 6 1 -2 -3 1 5 11 0]]
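The decoder needs the inverse operation: stitching codeblocks back into a band. A hypothetical counterpart (not from the notebook), assuming the same row-major block order that divide_into_codeblocks produces:

```python
import numpy as np

def assemble_codeblocks(blocks, band_shape, block_size=(64, 64)):
    """Stitches codeblocks back into a band of shape band_shape,
    assuming the row-major order produced by divide_into_codeblocks."""
    height, width = band_shape
    block_height, block_width = block_size
    band = np.zeros(band_shape, dtype=blocks[0].dtype)
    idx = 0
    for i in range(0, height, block_height):
        for j in range(0, width, block_width):
            bh, bw = blocks[idx].shape  # edge blocks may be smaller
            band[i:i+bh, j:j+bw] = blocks[idx]
            idx += 1
    return band

# Round-trip check on a small band whose size is not a multiple of the block size
band = np.arange(30 * 20).reshape(30, 20)
blocks = [band[i:i+16, j:j+16] for i in range(0, 30, 16) for j in range(0, 20, 16)]
assert np.array_equal(assemble_codeblocks(blocks, band.shape, (16, 16)), band)
```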
5.3 Bit-plane Coding Techniques¶
Each codeblock undergoes bit-plane coding, where coefficients are processed bit by bit from the most significant bit (MSB) to the least significant bit (LSB).
Three Coding Passes per Bit-plane
- Significance Propagation Pass
- Magnitude Refinement Pass
- Cleanup Pass
1. Significance Propagation Pass
- Objective: Identify new significant coefficients (coefficients that become non-zero at the current bit-plane) that have significant neighbors.
- Processing: For each coefficient that is not yet significant but has significant neighbors, encode its significance and sign.
2. Magnitude Refinement Pass
- Objective: Refine the magnitude bits of coefficients that became significant in previous bit-planes.
- Processing: Output the bit at the current bit-plane position for these coefficients.
3. Cleanup Pass
- Objective: Process the remaining coefficients that were not handled in the previous two passes.
- Processing: Encode significance and sign for any new significant coefficients and output zero bits for insignificant ones.
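To make the three passes concrete, here is a much-simplified sketch that, for one bit-plane, assigns every coefficient to the pass that would code it. It ignores the stripe-oriented scan order and the within-pass significance updates of the real standard; `classify_passes` and its labels are illustrative names, not part of JPEG 2000:

```python
import numpy as np

def classify_passes(block, bitplane, significant):
    """Assigns each coefficient to a coding pass for one bit-plane.
    - block: 2D int array of quantized coefficients.
    - bitplane: bit position currently being coded (MSB first).
    - significant: boolean map of coefficients already significant
      from higher bit-planes (updated in place at the end).
    Returns a 2D array of labels: 1 = significance propagation,
    2 = magnitude refinement, 3 = cleanup."""
    mag = np.abs(block)
    h, w = block.shape
    passes = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            if significant[i, j]:
                # Already significant: refine its magnitude bit
                passes[i, j] = 2
            else:
                # Any of the 8 neighbours significant?
                nbr = significant[max(i-1, 0):i+2, max(j-1, 0):j+2]
                passes[i, j] = 1 if nbr.sum() > 0 else 3
    # Coefficients whose current bit is 1 become significant
    # for the next (lower) bit-plane
    significant |= ((mag >> bitplane) & 1).astype(bool)
    return passes

block = np.array([[0, 5, 0],
                  [0, 0, 0],
                  [0, 0, 3]])
sig = np.zeros((3, 3), dtype=bool)
classify_passes(block, 2, sig)      # first bit-plane: everything is in the cleanup pass
p = classify_passes(block, 1, sig)  # the 5 is now significant; its neighbours join pass 1
```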
5.4 Context Modeling¶
Context modeling uses the significance states of neighboring coefficients to predict the probability of a coefficient becoming significant.
Significance States
- Significant: A coefficient has become non-zero in previous bit-planes.
- Insignificant: A coefficient is still zero up to the current bit-plane.
Typically, the eight immediate neighbors are considered when determining the context.
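A minimal sketch of the neighbour statistics behind context selection (the real standard maps these three counts through sub-band-dependent lookup tables to one of nine significance contexts; `neighbor_counts` is a hypothetical helper):

```python
import numpy as np

def neighbor_counts(significant, i, j):
    """Counts significant horizontal, vertical, and diagonal neighbours
    of coefficient (i, j); out-of-bounds neighbours count as insignificant."""
    h, w = significant.shape
    def s(y, x):
        return int(0 <= y < h and 0 <= x < w and significant[y, x])
    horiz = s(i, j - 1) + s(i, j + 1)
    vert = s(i - 1, j) + s(i + 1, j)
    diag = s(i - 1, j - 1) + s(i - 1, j + 1) + s(i + 1, j - 1) + s(i + 1, j + 1)
    return horiz, vert, diag

sig = np.zeros((3, 3), dtype=bool)
sig[1, 0] = sig[0, 1] = sig[2, 2] = True
print(neighbor_counts(sig, 1, 1))  # (1, 1, 1): one neighbour of each kind
```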
5.5 Simplified Python Implementation¶
Implementing full EBCOT with arithmetic coding is complex. We’ll provide a simplified version that demonstrates the key concepts.
import numpy as np
import matplotlib.pyplot as plt
def visualize_bit_planes(array):
"""
Visualizes all bit planes of a 2D NumPy array, including the sign map.
Parameters:
- array: numpy.ndarray
A 2D array of integers (e.g., quantized coefficients).
Displays:
- A grid of images showing the sign map and each bit plane from MSB to LSB.
"""
# Get the sign map: 1 for negative, 0 for zero or positive
sign_map = (array < 0).astype(np.uint8)
# Get the absolute values to extract magnitude bits
abs_array = np.abs(array)
# Find the maximum value to determine the number of bits needed
max_value = abs_array.max()
# Handle the case when the entire array is zeros
if max_value == 0:
num_bits = 1
else:
num_bits = int(np.floor(np.log2(max_value))) + 1
# Prepare the subplot grid
ncols = 4 # Number of columns in the grid
nrows = int(np.ceil((num_bits + 1) / ncols)) # +1 for the sign map
fig, axes = plt.subplots(nrows, ncols, figsize=(ncols * 4, nrows * 4))
# Flatten axes array for easy indexing
axes = axes.flatten() if nrows * ncols > 1 else [axes]
# Plot the sign map
axes[0].imshow(sign_map, cmap='gray')
axes[0].set_title('Sign Map')
axes[0].axis('off')
# Loop over each bit plane from MSB to LSB
for idx, bit in enumerate(range(num_bits - 1, -1, -1)):
# Extract the bit plane
bit_plane = ((abs_array >> bit) & 1).astype(np.uint8)
# Display the bit plane
ax = axes[idx + 1] # +1 to account for the sign map
ax.imshow(bit_plane, cmap='gray')
ax.set_title(f'Bit Plane {bit}')
ax.axis('off')
# Hide any unused subplots
for idx in range(num_bits + 1, nrows * ncols):
axes[idx].axis('off')
plt.tight_layout()
plt.show()
# Assuming 'coefficients_quantized' is a list of quantized coefficients
# Let's take one of the sub-bands, for example, the horizontal detail coefficients at the first level
band = coefficients_quantized[1][0] # Replace with the appropriate index if needed
print(f"{band.dtype = } {band.shape = } {band.min() = } {band.max() = }")
visualize_bit_planes(band)
band.dtype = dtype('int64') band.shape = (134, 134) band.min() = np.int64(-58) band.max() = np.int64(55)
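visualize_bit_planes only displays the planes; reading them back confirms that the decomposition is lossless. A short sketch (the helper names are mine, not from the notebook):

```python
import numpy as np

def decompose_bit_planes(array):
    """Splits an integer array into a sign map and magnitude bit planes (MSB first)."""
    sign_map = (array < 0)
    mag = np.abs(array)
    num_bits = int(np.floor(np.log2(mag.max()))) + 1 if mag.max() > 0 else 1
    planes = [((mag >> bit) & 1) for bit in range(num_bits - 1, -1, -1)]
    return sign_map, planes

def recompose_bit_planes(sign_map, planes):
    """Rebuilds the array: accumulate the planes MSB first, then reapply the signs."""
    mag = np.zeros(sign_map.shape, dtype=np.int64)
    for plane in planes:
        mag = (mag << 1) | plane
    return np.where(sign_map, -mag, mag)

a = np.array([[-58, 0], [55, -7]])
sign_map, planes = decompose_bit_planes(a)
assert np.array_equal(recompose_bit_planes(sign_map, planes), a)
```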